Data augmentation and language model adaptation

Authors

  • David Janiszek
  • Renato De Mori
  • Frédéric Béchet
Abstract

A method is presented for augmenting word n-gram counts in a matrix that represents a bigram Language Model (LM). The method is based on numerical distances in a reduced space obtained by Singular Value Decomposition (SVD). Rescoring word lattices in a spoken dialogue application with an LM containing augmented counts has led to a Word Error Rate (WER) reduction of 6.5%. By further interpolating the augmented counts with counts extracted from a very large newspaper corpus, but only for selected histories, a total WER reduction of 11.7% was obtained. We show that this approach gives better results than a global count interpolation applied to all histories of the LM.
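The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the distance function, the smoothing function, or the interpolation weights, so the Euclidean distance, the exponential kernel, and the names `augment_counts`, `interpolate_selected`, `rank`, `scale`, and `lam` are all assumptions chosen for illustration.

```python
import numpy as np


def augment_counts(counts, rank=2, scale=1.0):
    """Augment a bigram count matrix using distances in an SVD-reduced space.

    counts[h, w] is the number of times word w followed history h.
    The kernel exp(-distance / scale) is an assumed smoothing function.
    """
    counts = np.asarray(counts, dtype=float)
    # Truncated SVD: each history becomes a `rank`-dimensional vector.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    reduced = u[:, :rank] * s[:rank]
    # Pairwise Euclidean distances between histories in the reduced space.
    diff = reduced[:, None, :] - reduced[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Nearby histories contribute more; a history does not add to itself.
    weights = np.exp(-dist / scale)
    np.fill_diagonal(weights, 0.0)
    # Each row receives a distance-weighted share of the other rows' counts.
    return counts + weights @ counts


def interpolate_selected(in_domain, general, selected, lam=0.5):
    """Interpolate counts with a general corpus for selected histories only.

    Rows listed in `selected` become lam * in_domain + (1 - lam) * general;
    all other rows keep their in-domain counts unchanged.
    """
    out = np.array(in_domain, dtype=float)
    general = np.asarray(general, dtype=float)
    idx = list(selected)
    out[idx] = lam * out[idx] + (1.0 - lam) * general[idx]
    return out


if __name__ == "__main__":
    # Toy data: histories 0 and 1 behave alike, history 2 does not.
    toy = [[4, 0, 1], [3, 0, 1], [0, 5, 0]]
    augmented = augment_counts(toy, rank=2)
    newspaper = np.full((3, 3), 10.0)  # stand-in for large-corpus counts
    print(interpolate_selected(augmented, newspaper, selected=[2]))
```

In this toy setup the unseen events receive small nonzero counts, and history 0 gains far more from the nearby history 1 than from the distant history 2. How histories are selected for interpolation is not specified in the abstract, so `selected` is simply passed in by hand here.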

Similar resources

Data augmentation and language model adaptation using singular value decomposition

A new augmentation method for counts to be used in language modeling is presented. It is based on word representations in a reduced space obtained with Singular Value Decomposition. A contribution to a count for a linguistic event x is obtained from the counts of observed events smoothed with a function of their distance from x. Experimental results on a spoken dialogue corpus show the performa...

Integrating MAP and linear transformation for language model adaptation

This paper discusses the integration of various language model (LM) adaptations. Ways of integrating Maximum A Posteriori (MAP) adaptation and linear transformation of bigram probability vectors are introduced and evaluated. This method yields little improvement for adaptation corpora of fewer than 15,000 words. Another method, based on a data augmentation technique by means of a distance bet...

A Common Case of Jekyll and Hyde: The Synergistic Effect of Using Divided Source Training Data for Feature Augmentation

Feature augmentation is a well-known method for domain adaptation and has been shown to be effective when tested on several NLP tasks (Daume III, 2007). However, a limitation of the method is that it requires labeled data from the target domain and very often such data is unavailable. In this paper, we propose to use training data selection to divide the source domain training data into two par...

Persian Adaptation of Enhanced Milieu Teaching for Iranian Children With Expressive Language Delay

Objectives: This study aimed at adapting and examining the applicability of the Teach-Model-Coach-Review model of the enhanced milieu teaching (EMT) approach for improving Iranian mothers’ language strategies while interacting with their toddlers with expressive language delay. Methods: In a single-subject multiple-baseline across-behavior study, the mothers of 3 toddlers with expressive langu...

The Intersemiotic Study of Translation from Page to Stage: The Farsi Translation of Macbeth for Stage Adaptation from the Perspective of Peirce's Model

Intersemiotic translation, which can happen in the process of the translation of drama for theatre, can turn more complicated when the verbal sign system of drama has already undergone interlingual translation. The purpose of this study is to find the intersemiotic changes of translation from page to stage and to show the changes of indexical, iconic, and symbolic signs in the process of inters...

EFL Classroom Discourse in Iranian Context: Investigating Teacher Talk Adaptation to Students’ Proficiency Level

How language teachers talk is a key factor in organizing and facilitating learning specifically in language classrooms where the medium of instruction is also the subject matter. This study aimed to examine the extent and ways of teacher talk adaptation to students’ proficiency levels in the Iranian EFL context. Two EFL teachers who were teaching three different proficiency levels were observed...

Publication date: 2001